METHOD AND APPARATUS FOR PERFORMING
MACHINE TRANSLATION USING A UNIFIED
LANGUAGE MODEL AND TRANSLATION MODEL

BACKGROUND OF THE INVENTION

The present invention relates to machine
translation of languages. More specifically, the
present invention relates to phrase translation of
languages using a unified language and translation
model.

Machine translation involves a computer
receiving input text either in written form, or in
the form of speech, or in another suitable
machine-readable form. The machine may typically use a
statistical translation model in order to translate
the words in the input text from a first language (in
which they are input) to a second, desired language.
The translation is then output by the machine
translator.

Previous methods of machine translation can
roughly be classified into two categories. The first
category includes rule-based translators. These
translators receive input text and apply rules to the
input text in order to arrive at a translation from a
first language to a second language. However, such
rule-based systems suffer from a number of
disadvantages. For example, such systems are
relatively slow, and exhibit low robustness.

The second category of prior machine
translation systems includes statistically based
systems. Such systems use statistical models in an
attempt to translate the words in the input from a
first language to a second language. However,
statistical models also suffer from certain
disadvantages. For example, such models often suffer
because they largely ignore structural information in
performing the translation. This has resulted in
poor translation quality.

SUMMARY OF THE INVENTION 

The present invention is a method and
apparatus for processing a phrase in a first language
for translation to a second language. A plurality of
possible linguistic patterns in the second language
that correspond to the phrase in the first language
are identified. For each of the patterns identified,
a translation probability for the pattern is
calculated, based on a combination of a language
model probability for the pattern and a translation
model probability for the pattern. In one embodiment,
an output is also provided which is indicative of a
translation of the phrase in the first language to
the second language based upon the translation
probabilities calculated for the patterns.

In one embodiment, a highest translation
probability is identified and a linguistic pattern,
for which the highest translation probability was
calculated, is identified as being indicative of a
likely phrase translation of the phrase in the first
language.

The present invention can also be
implemented as an apparatus which includes a pattern
engine that receives a phrase in the first language
and identifies a plurality of linguistic patterns in
the second language which possibly correspond to a
translation of the phrase from the first language to
the second language. The apparatus also includes a
probability generator configured to generate, for
each linguistic pattern identified, a translation
probability for translating the phrase in the first
language to the second language in the linguistic
pattern.

The apparatus may further include a
bi-lingual data store storing phrases in the first
language and corresponding linguistic patterns in the
second language. In addition, the probability
generator illustratively includes a translation
model, such that the probability generator is
configured to generate the translation probability by
accessing the translation model. The probability
generator illustratively further includes a language
model in the second language, such that the
probability generator is configured to generate the
translation probability by accessing the language
model as well.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of an
illustrative environment in which the present
invention can be practiced.

Figure 2 is a more detailed block diagram
of a machine translator in accordance with one
feature of the present invention.

Figure 3 is a flow diagram illustrating the
operation of the machine translator shown in
Figure 2.

Figures 4A and 4B illustrate one embodiment
of linguistic patterns.

Figure 5 is a flow diagram further
illustrating calculation of the translation
probability.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

FIG. 1 illustrates an example of a suitable
computing system environment 100 on which the
invention may be implemented. The computing system
environment 100 is only one example of a suitable
computing environment and is not intended to suggest
any limitation as to the scope of use or
functionality of the invention. Neither should the
computing environment 100 be interpreted as having
any dependency or requirement relating to any one or
combination of components illustrated in the
exemplary operating environment 100.

The invention is operational with numerous
other general purpose or special purpose computing
system environments or configurations. Examples of
well known computing systems, environments, and/or
configurations that may be suitable for use with the
invention include, but are not limited to, personal
computers, server computers, hand-held or laptop
devices, multiprocessor systems, microprocessor-based
systems, set top boxes, programmable consumer
electronics, network PCs, minicomputers, mainframe
computers, distributed computing environments that
include any of the above systems or devices, and the
like.

The invention may be described in the
general context of computer-executable instructions,
such as program modules, being executed by a
computer. Generally, program modules include
routines, programs, objects, components, data
structures, etc. that perform particular tasks or
implement particular abstract data types. The
invention may also be practiced in distributed
computing environments where tasks are performed by
remote processing devices that are linked through a
communications network. In a distributed computing
environment, program modules may be located in both
local and remote computer storage media including
memory storage devices.

With reference to FIG. 1, an exemplary
system for implementing the invention includes a
general purpose computing device in the form of a
computer 110. Components of computer 110 may
include, but are not limited to, a processing unit
120, a system memory 130, and a system bus 121 that
couples various system components including the
system memory to the processing unit 120. The system
bus 121 may be any of several types of bus structures
including a memory bus or memory controller, a
peripheral bus, and a local bus using any of a
variety of bus architectures. By way of example, and
not limitation, such architectures include Industry
Standard Architecture (ISA) bus, Micro Channel
Architecture (MCA) bus, Enhanced ISA (EISA) bus,
Video Electronics Standards Association (VESA) local
bus, and Peripheral Component Interconnect (PCI) bus
also known as Mezzanine bus.

Computer 110 typically includes a variety
of computer readable media. Computer readable media
can be any available media that can be accessed by
computer 110 and includes both volatile and
nonvolatile media, removable and non-removable media.
By way of example, and not limitation, computer
readable media may comprise computer storage media
and communication media. Computer storage media
includes both volatile and nonvolatile, removable and
non-removable media implemented in any method or
technology for storage of information such as
computer readable instructions, data structures,
program modules or other data. Computer storage
media includes, but is not limited to, RAM, ROM,
EEPROM, flash memory or other memory technology,
CD-ROM, digital versatile disks (DVD) or other optical
disk storage, magnetic cassettes, magnetic tape,
magnetic disk storage or other magnetic storage
devices, or any other medium which can be used to
store the desired information and which can be
accessed by computer 110. Communication media
typically embodies computer readable instructions,
data structures, program modules or other data in a
modulated data signal such as a carrier wave or other
transport mechanism and includes any information
delivery media. The term "modulated data signal"
means a signal that has one or more of its
characteristics set or changed in such a manner as to
encode information in the signal. By way of example,
and not limitation, communication media includes
wired media such as a wired network or direct-wired
connection, and wireless media such as acoustic, RF,
infrared and other wireless media. Combinations of
any of the above should also be included within the
scope of computer readable media.

The system memory 130 includes computer
storage media in the form of volatile and/or
nonvolatile memory such as read only memory (ROM) 131
and random access memory (RAM) 132. A basic
input/output system 133 (BIOS), containing the basic
routines that help to transfer information between
elements within computer 110, such as during
start-up, is typically stored in ROM 131. RAM 132
typically contains data and/or program modules that
are immediately accessible to and/or presently being
operated on by processing unit 120. By way of
example, and not limitation, FIG. 1 illustrates
operating system 134, application programs 135, other
program modules 136, and program data 137.

The computer 110 may also include other
removable/non-removable volatile/nonvolatile computer
storage media. By way of example only, FIG. 1
illustrates a hard disk drive 141 that reads from or
writes to non-removable, nonvolatile magnetic media,
a magnetic disk drive 151 that reads from or writes
to a removable, nonvolatile magnetic disk 152, and an
optical disk drive 155 that reads from or writes to a
removable, nonvolatile optical disk 156 such as a CD
ROM or other optical media. Other
removable/non-removable, volatile/nonvolatile computer
storage media that can be used in the exemplary
operating environment include, but are not limited to,
magnetic tape cassettes, flash memory cards, digital
versatile disks, digital video tape, solid state RAM,
solid state ROM, and the like. The hard disk drive 141
is typically connected to the system bus 121 through a
non-removable memory interface such as interface 140,
and magnetic disk drive 151 and optical disk drive
155 are typically connected to the system bus 121 by
a removable memory interface, such as interface 150.

The drives and their associated computer
storage media discussed above and illustrated in FIG.
1, provide storage of computer readable instructions,
data structures, program modules and other data for
the computer 110. In FIG. 1, for example, hard disk
drive 141 is illustrated as storing operating system
144, application programs 145, other program modules
146, and program data 147. Note that these
components can either be the same as or different
from operating system 134, application programs 135,
other program modules 136, and program data 137.
Operating system 144, application programs 145, other
program modules 146, and program data 147 are given
different numbers here to illustrate that, at a
minimum, they are different copies.

A user may enter commands and information
into the computer 110 through input devices such as a
keyboard 162, a microphone 163, and a pointing device
161, such as a mouse, trackball or touch pad. Other
input devices (not shown) may include a joystick,
game pad, satellite dish, scanner, or the like.
These and other input devices are often connected to
the processing unit 120 through a user input
interface 160 that is coupled to the system bus, but
may be connected by other interface and bus
structures, such as a parallel port, game port or a
universal serial bus (USB). A monitor 191 or other
type of display device is also connected to the
system bus 121 via an interface, such as a video
interface 190. In addition to the monitor, computers
may also include other peripheral output devices such
as speakers 197 and printer 196, which may be
connected through an output peripheral interface 190.

The computer 110 may operate in a networked
environment using logical connections to one or more
remote computers, such as a remote computer 180. The
remote computer 180 may be a personal computer, a
hand-held device, a server, a router, a network PC, a
peer device or other common network node, and
typically includes many or all of the elements
described above relative to the computer 110. The
logical connections depicted in FIG. 1 include a
local area network (LAN) 171 and a wide area network
(WAN) 173, but may also include other networks. Such
networking environments are commonplace in offices,
enterprise-wide computer networks, intranets and the
Internet.

When used in a LAN networking environment,
the computer 110 is connected to the LAN 171 through
a network interface or adapter 170. When used in a
WAN networking environment, the computer 110
typically includes a modem 172 or other means for
establishing communications over the WAN 173, such as
the Internet. The modem 172, which may be internal
or external, may be connected to the system bus 121
via the user input interface 160, or other
appropriate mechanism. In a networked environment,
program modules depicted relative to the computer
110, or portions thereof, may be stored in the remote
memory storage device. By way of example, and not
limitation, FIG. 1 illustrates remote application
programs 185 as residing on remote computer 180. It
will be appreciated that the network connections
shown are exemplary and other means of establishing a
communications link between the computers may be
used.

Figure 2 is a more detailed block diagram
of a machine translator 200 in accordance with one
embodiment of the present invention. Translator 200
illustratively receives a phrase 202 in a first
language and provides an output 204 which is
indicative of a translation of phrase 202 into a
second language. Translator 200 illustratively has
access to a bi-lingual data corpus 206 and second
language corpus 208. Translator 200 also
illustratively has access to bi-lingual pattern data
store 210. Further, translator 200, itself,
illustratively includes probability generator 212 and
translator component 214. Probability generator 212
illustratively includes a translation model 216, a
pattern probability model 218 and a language model
220 for the second language.

While the translation system of the present
invention can be described with respect to
translating between substantially any two languages,
the present invention will be described herein, for
exemplary purposes only, as translating from an
English input phrase to a Chinese output phrase.
Therefore, phrase 202 is illustratively a phrase in
the English language and output 204 is illustratively
some indication as to the translation of phrase 202
into Chinese.

In one illustrative embodiment, bi-lingual
pattern data store 210 is trained by accessing
bi-lingual corpus 206, so that different linguistic
patterns in Chinese can be identified for any given
phrase in English.

More specifically, bi-lingual corpus 206
illustratively includes both a large Chinese language
corpus and a large English language corpus.
Bi-lingual pattern data store 210 is trained based on
bi-lingual corpus 206 and includes a plurality of
Chinese linguistic patterns which can correspond to a
given English phrase.

Second language corpus 208 is
illustratively a large Chinese text corpus. Of
course, second language corpus 208 can be the Chinese
portion of bi-lingual corpus 206, or a separate
corpus. Language model 220 is illustratively trained
based upon the second language corpus 208. Language
model 220 is illustratively a conventional language
model (such as a tri-gram language model) which
provides the probability of any given Chinese word,
given its history. Specifically, in the tri-gram
embodiment, language model 220 provides the
probability of a Chinese word given the two previous
words in the phrase under analysis.
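
A tri-gram model of this kind can be sketched in a few
lines. The sketch below is only an illustration: the
function names, the "<s>" padding symbol and the
add-alpha smoothing are assumptions made for the
example and are not taken from language model 220.

```python
from collections import Counter

def train_trigram(sentences):
    """Count tri-grams and their two-word histories in tokenized text."""
    tri, bi = Counter(), Counter()
    for words in sentences:
        padded = ["<s>", "<s>"] + list(words)   # pad so every word has a history
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            tri[history + (padded[i],)] += 1
            bi[history] += 1
    return tri, bi

def trigram_prob(tri, bi, w2, w1, w, vocab_size, alpha=1.0):
    """Estimate P(w | w2, w1); add-alpha smoothing is an assumption."""
    return (tri[(w2, w1, w)] + alpha) / (bi[(w2, w1)] + alpha * vocab_size)

# Tiny hypothetical corpus of two tokenized Chinese phrases.
corpus = [["我", "爱", "你"], ["我", "爱", "她"]]
tri_counts, bi_counts = train_trigram(corpus)
p = trigram_prob(tri_counts, bi_counts, "我", "爱", "你", vocab_size=4)
```

With this toy corpus the history ("我", "爱") occurs
twice and is followed once by "你", so the smoothed
estimate is (1 + 1) / (2 + 4).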

Pattern probability model 218 is a model
which generates the probability of any given
linguistic pattern in the second language (for the
sake of this example, in the Chinese language).
Translation model 216 can be any suitable translation
model which provides a probability of translation of
a word in the first language (e.g. English) to a word
in the second language (e.g. Chinese). In the
illustrative embodiment, translation model 216 is the
well-known translation model developed by
International Business Machines, of Armonk, New York,
and is discussed in greater detail below.

Translator component 214 receives the
probabilities generated by probability generator 212
and provides an indication as to a translation of the
English phrase 202 into a Chinese phrase 204. Of
course, translator component 214 can be part of
probability generator 212, or can be a
separately-operable component.

Figure 3A is a flow diagram which
illustrates in more detail the general operation of
translator 200 shown in Figure 2. First, translator
200 receives the input phrase 202 in the first
language (for purposes of this example, the English
language). This is indicated by block 230 in Figure
3A.

Pattern probability model 218 then obtains
a plurality of possible linguistic patterns 232
associated with the input phrase from bi-lingual
pattern data store 210. This is indicated by block
234 in Figure 3A. Figures 4A and 4B better
illustrate the different patterns which can be
assigned to a phrase in a first language. Figure 4A
shows a tree for an English phrase (represented by
"E"). The nodes D and E on the tree in Figure 4A are
non-terminal nodes, while the nodes A, B and C
represent terminal, or leaf nodes, and thus,
represent the individual words in phrase E. It can
be seen from Figure 4A that the phrase E is composed
of a non-terminal phrase D and the English word C.
The phrase D is composed of the two English words A
and B.

Figure 4B illustrates the wide variety of
linguistic patterns that can be used in translating
the phrase E. Those patterns are identified by
numerals 300, 302, 304, 306, 308 and 310. Linguistic
pattern 300 illustrates that the translation of
phrase E can be formed by translating the phrase D
followed by a translation of the word C. Linguistic
pattern 302 indicates that the translation of phrase
E can be composed of a translation of the word C
followed by a translation of the phrase D. Of
course, since phrase D is actually made up of two
words (A and B), translation of phrase D can also be
performed by translating the word A and following it
with the translation of the word B, or vice versa.
This is indicated by patterns 304 and 306. Patterns
308 and 310 show the same type of linguistic
patterns, except where the expanded translation of
the phrase D follows translation of the word C.
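
The six orderings described for Figure 4B can be
reproduced by a short enumeration over the tree of
Figure 4A. This is a sketch under assumptions: the
tuple encoding of the tree and the names FIG4A_TREE
and patterns are hypothetical, and in the described
system the patterns are retrieved from bi-lingual
pattern data store 210 rather than enumerated.

```python
from itertools import permutations, product

# Phrase E = (D, C) and D = (A, B), as described for Figure 4A.
FIG4A_TREE = ("E", [("D", [("A", []), ("B", [])]), ("C", [])])

def patterns(node, top=True):
    """Yield orderings of translation units for a (label, children) tree."""
    label, children = node
    if not children:              # leaf node: a single word
        yield (label,)
        return
    if not top:                   # a sub-phrase may be translated as one unit
        yield (label,)
    for order in permutations(children):
        options = [list(patterns(child, top=False)) for child in order]
        for parts in product(*options):
            yield tuple(unit for part in parts for unit in part)

all_patterns = list(patterns(FIG4A_TREE))
```

Under this encoding, ("D", "C") and ("C", "D")
correspond to patterns 300 and 302, and the four
orderings in which D is expanded into A and B
correspond to patterns 304 through 310.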

Therefore, bi-lingual pattern data store
210 illustratively includes a plurality of English
phrases (such as phrase E) followed by a
corresponding plurality of linguistic patterns in the
second language (such as the linguistic patterns set
out in Figure 4B) which correspond to, and are
possible linguistic translation patterns of, the
English phrase E. In step 234 in Figure 3A, pattern
probability model 218 retrieves those patterns
(referred to as patterns 232) from bi-lingual pattern
data store 210, based on the English input phrase E.

Probability generator 212 then selects one
of the linguistic patterns 232 as indicated by block
236 in Figure 3A. Probability generator 212 then
generates a translation probability for the selected
linguistic pattern. As will be described in greater
detail later with respect to Figure 5, the
translation probability is a combination of
probabilities generated by pattern probability model
218, translation model 216 and language model 220.
The combined translation probability is then provided
by probability generator 212 to translator component
214. Calculation of the translation probability is
indicated by block 238 in Figure 3A.

Probability generator 212 then determines
whether there are any additional patterns for the
English phrase E for which a translation probability
must be generated. This is indicated by block 240.
If additional linguistic patterns exist, processing
continues at block 236. However, if no linguistic
patterns remain for which a translation probability
has not been calculated, probability generator 212
provides the combined probabilities for each of the
plurality of patterns at its output to translator
component 214. This is indicated by block 242 in
Figure 3A.

It will be noted, of course, that the
output from probability generator 212 can be provided
as each probability is generated. In addition,
probability generator 212 can optionally provide at
its output only the linguistic pattern associated with
the highest translation probability. However,
probability generator 212 can also provide the top
N-best linguistic patterns, based on the translation
probability, or it can provide all linguistic
patterns identified, and their associated translation
probabilities, ranked in the order of the highest
translation probability first, or in any other
desired order.
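
These output options (best pattern only, top N-best,
or all patterns in ranked order) amount to sorting
the scored patterns. A minimal sketch follows; the
function name rank_patterns and the probability
values are hypothetical and serve only to illustrate
the ranking.

```python
def rank_patterns(scored_patterns, n_best=None):
    """Sort (pattern, probability) pairs, highest probability first."""
    ranked = sorted(scored_patterns, key=lambda pair: pair[1], reverse=True)
    return ranked if n_best is None else ranked[:n_best]

# Hypothetical translation probabilities for three patterns.
scores = [("300", 0.20), ("302", 0.05), ("304", 0.45)]

best_only = rank_patterns(scores, n_best=1)   # highest probability only
top_two = rank_patterns(scores, n_best=2)     # top N-best with N = 2
everything = rank_patterns(scores)            # all patterns, ranked
```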

Once translator component 214 receives the
linguistic patterns and the associated translation
probabilities, it provides, at its output, an
indication of the translation of the English phrase E
into the second language (in this case, the Chinese
language). This is indicated by block 244 in Figure
3A. Again, the output from translator component 214
can be done in one of a wide variety of ways. It can
provide different translations, ranked in order of
their translation probabilities, or it can provide
only the best translation, corresponding to the
highest translation probability calculated, or it can
provide any combination or other desired outputs.

Figure 5 is a flow diagram illustrating the
calculation of the translation probability
(illustrated by block 238 in Figure 3A) in greater
detail. Figure 5 illustrates that pattern
probability model 218 calculates the pattern
probability associated with the selected pattern.
This is indicated by block 246. Figure 5 also shows
that language model 220 calculates the language model
probability for the second language, given terms in
the selected pattern. This is indicated by block 248
in Figure 5. Figure 5 further shows that translation
model 216 calculates the translation model
probability for the English language phrase given the
terms in the Chinese language phrase and the selected
pattern. This is indicated by block 250 in Figure 5.
Finally, a combined probability is calculated for
each linguistic pattern, as the translation
probability, based upon the pattern probability, the
language model probability and the translation model
probability. This is performed by probability
generator 212 and is indicated by block 252 in Figure
5. The discussion now proceeds with respect to
deriving the overall phrase translation probability
based upon the three probabilities set out in Figure
5.

For the following discussion, let "e"
represent an English phrase containing "n" words, and
let "w_i" represent the "i-th" word in the phrase. Let
"c" represent the Chinese translation of the English
phrase "e", and let "pattern" represent the related
linguistic phrase translation patterns which
correspond to the English phrase "e". The present
statistical model is based on the overall probability
of the Chinese phrase "c", given the English phrase
"e" as follows:

$$P(c \mid e) = \frac{P(\mathit{pattern} \mid e) \times P(c \mid \mathit{pattern}, e)}{P(\mathit{pattern} \mid c, e)} \qquad \text{Eq. 1}$$

Also, assume that:

$$P(\mathit{pattern} \mid c, e) = 1 \qquad \text{Eq. 2}$$

then, from Bayes' law:

$$P(c \mid \mathit{pattern}, e) = \frac{P(e \mid c, \mathit{pattern}) \times P(c \mid \mathit{pattern})}{P(e \mid \mathit{pattern})} \qquad \text{Eq. 3}$$

and further assume that:

$$P(e \mid \mathit{pattern}) = 1 \qquad \text{Eq. 4}$$

Let P(pattern | e) be referred to as the pattern
probability, or the probability of generating a given
Chinese linguistic pattern, given the English input
text; let P(c | pattern) be called the Chinese
Statistical Language Model, in other words, the
probability of the Chinese translation "c" given the
linguistic "pattern"; and let P(e | c, pattern) be
called the Translation Model, which represents the
probability of generating the phrase "e" given the
Chinese translation "c" and the pattern "pattern".
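
The substitution that combines Equations 1 through 4
is not written out in the text. A sketch of that step,
using only the assumptions stated in Equations 2 and
4, is:

```latex
\begin{align*}
P(c \mid e)
  &= \frac{P(\mathit{pattern} \mid e) \times P(c \mid \mathit{pattern}, e)}
          {P(\mathit{pattern} \mid c, e)}
     && \text{(Eq. 1)} \\
  &= P(\mathit{pattern} \mid e) \times P(c \mid \mathit{pattern}, e)
     && \text{(by Eq. 2)} \\
  &= P(\mathit{pattern} \mid e) \times
     \frac{P(e \mid c, \mathit{pattern}) \times P(c \mid \mathit{pattern})}
          {P(e \mid \mathit{pattern})}
     && \text{(by Eq. 3)} \\
  &= P(\mathit{pattern} \mid e) \times P(c \mid \mathit{pattern})
     \times P(e \mid c, \mathit{pattern})
     && \text{(by Eq. 4)}
\end{align*}
```

The result is the product of the pattern probability,
the Chinese Statistical Language Model and the
Translation Model defined above.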

Further, we make the following two
assumptions. First, a second-order hidden Markov model
is used and second, an assumption of independence is
made between the hidden Markov models and the
probability set out in P(pattern | e).

Then, simplifying the above equations, the
following probability of generating the Chinese
translation "c" given the English language phrase "e"
is given by:

$$P(c \mid e) = \prod_{i=1}^{m} P(\mathit{pattern}_i) \times \prod_{i=1}^{n} \left( P(c_i \mid c_{i-2}, c_{i-1}) \times P(ew_i \mid c_i) \right) \qquad \text{Eq. 5}$$

Therefore, the problem of performing the
machine translation is transformed into a search
problem, as follows:

$$\text{phrase translation} = \arg\max \left( \prod_{i=1}^{m} P(\mathit{pattern}_i) \times \prod_{i=1}^{n} \left( P(c_i \mid c_{i-2}, c_{i-1}) \times P(ew_i \mid c_i) \right) \right) \qquad \text{Eq. 6}$$

where "m" is the number of linguistic patterns used
in the phrase translation, "h" is the context, there
are "n" characters in the proposed Chinese
translation, and "ew" represents a given word in the
English phrase.
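
The product of Equation 5 and the search of Equation 6
can be sketched as follows. Several points are
assumptions made only for this illustration: the
candidate translations are given as an explicit list,
each Chinese character c_i is paired one-to-one with
an English word ew_i, lm_prob and tm_prob are
hypothetical callables standing in for language model
220 and translation model 216, all probabilities are
nonzero, and logarithms are summed instead of
multiplying raw probabilities.

```python
import math

def log_score(pattern_probs, chinese, english_words, lm_prob, tm_prob):
    """Logarithm of the Equation 5 product for one candidate translation."""
    total = sum(math.log(p) for p in pattern_probs)   # pattern term
    padded = ["<s>", "<s>"] + list(chinese)            # pad the two-item history
    for i, (c, ew) in enumerate(zip(chinese, english_words)):
        # Language model term P(c_i | c_{i-2}, c_{i-1}).
        total += math.log(lm_prob(padded[i], padded[i + 1], c))
        # Translation model term P(ew_i | c_i).
        total += math.log(tm_prob(ew, c))
    return total

def best_translation(candidates, lm_prob, tm_prob):
    """Equation 6 as a plain argmax over an explicit candidate list."""
    return max(
        candidates,
        key=lambda cand: log_score(
            cand["pattern_probs"], cand["chinese"], cand["english"],
            lm_prob, tm_prob,
        ),
    )
```

Because the logarithm is monotonic, the candidate that
maximizes the summed log score is the same one that
maximizes the product in Equation 6.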

It can thus be seen that Equation 6
indicates that, for each linguistic pattern
identified as being a possible linguistic pattern
corresponding to a translation of the input English
text, both the language model probability and the
translation model probability are applied. This
provides a unified probability that not only includes
statistical information, but structural and
linguistic information as well. This leads to
structural information being reflected in the
statistical translation model and leads to an
improvement in the quality of the machine translation
system.

Although the present invention has been
described with reference to particular embodiments,
workers skilled in the art will recognize that
changes may be made in form and detail without
departing from the spirit and scope of the invention.



